Patent SMT Based on Combined Phrases for NTCIR-7

Authors

  • Junguo Zhu
  • Haoliang Qi
  • Muyun Yang
  • Jufeng Li
  • Sheng Li
Abstract

In this paper, we describe a combined phrase approach to the statistical machine translation (SMT) of Japanese patents into English. Patent texts contain many OOV (out-of-vocabulary) words, which cause segmentation errors. To resolve these errors, character-based translation phrases are first employed. Word-based translation phrases are then built to exploit reliable word-level information. Finally, the two phrase tables are linearly combined to capture both character-level and word-level translation correspondences. Preliminary experiments on the NTCIR-7 corpus indicate that the proposed method achieves significantly higher BLEU scores than the conventional word-based approach.
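
The abstract says the character-based and word-based phrase tables are "linearly combined" but gives no further detail. The following is a minimal Python sketch of one way such a linear interpolation could look. It is not the authors' implementation. The sketch makes several assumptions:

- Each table maps a (source phrase, target phrase) pair to a single probability p(e|f). Real phrase tables usually carry several scores per entry.
- Character-segmented and word-segmented source phrases are matched by stripping segmentation spaces.
- A pair missing from one table contributes zero from that table.

The names `combine_tables`, `normalize_source` and `normalize_table`, the weight `lam`, and the toy entries are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: (source phrase, target phrase) -> p(target | source).
PhraseTable = Dict[Tuple[str, str], float]


def normalize_source(src: str) -> str:
    """Remove segmentation whitespace so a character-segmented phrase ("特 許")
    and a word-segmented phrase ("特許") share one key (assumed convention)."""
    return "".join(src.split())


def normalize_table(table: PhraseTable) -> PhraseTable:
    """Re-key a phrase table on unsegmented source strings.
    If two entries collapse onto the same key, keep the larger probability."""
    out: PhraseTable = {}
    for (src, tgt), prob in table.items():
        key = (normalize_source(src), tgt)
        out[key] = max(prob, out.get(key, 0.0))
    return out


def combine_tables(char_table: PhraseTable, word_table: PhraseTable,
                   lam: float = 0.5) -> PhraseTable:
    """Linear interpolation of two phrase tables:
    p(e|f) = lam * p_char(e|f) + (1 - lam) * p_word(e|f).
    A pair absent from one table contributes 0 from that table."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    char_norm = normalize_table(char_table)
    word_norm = normalize_table(word_table)
    combined: PhraseTable = {}
    # Iterate over the union of phrase pairs seen in either table.
    for key in char_norm.keys() | word_norm.keys():
        combined[key] = (lam * char_norm.get(key, 0.0)
                         + (1.0 - lam) * word_norm.get(key, 0.0))
    return combined


if __name__ == "__main__":
    # Toy entries, invented for illustration only.
    char_table = {("特 許", "patent"): 0.8, ("特 許", "patents"): 0.2,
                  ("請 求 項", "claim"): 1.0}
    word_table = {("特許", "patent"): 0.6, ("特許", "patents"): 0.4}
    combined = combine_tables(char_table, word_table, lam=0.5)
    print(combined[("特許", "patent")])   # 0.5 * 0.8 + 0.5 * 0.6
    print(combined[("請求項", "claim")])  # only in the character table: 0.5 * 1.0
```

In this toy example, the pair ("請求項", "claim") exists only in the character-based table. This mimics a phrase that a word segmenter might have split wrongly, so it would be missing from the word-based table. The interpolated table still keeps it, with half of its original probability under `lam=0.5`. How the weight is actually chosen in the paper is not stated in this abstract.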

Similar resources

A Phrase Combination Approach to Patent SMT

This paper presents a phrase combination approach to patent SMT (statistical machine translation) from Japanese to English. Patent texts contain many OOV (out-of-vocabulary) words, which cause segmentation problems. To minimize these problems, character-based translation phrases are first introduced to avoid segmentation errors in translation modeling. Then the word-based translation phrases, which a...

The POSTECH Statistical Machine Translation Systems for NTCIR-7 Patent Translation Task

This paper describes the POSTECH statistical machine translation (SMT) systems for the NTCIR-7 patent translation task. We entered two patent translation subtasks: Japanese-to-English (KLE-je) and English-to-Japanese (KLE-ej). The baseline systems are derived from a common phrase-based SMT framework. In addition, for Japanese-to-English translation, we adopted two kinds of methods. ...

System Description of NiCT-ATR SMT for NTCIR-7

In this paper we propose a method to improve SMT-based patent translation. The method first uses the International Patent Classification to build class-based models. Multiple models are then interpolated using a weighting method based on source-side language models. We carried out experiments using data from the patent translation task of the NTCIR-7 workshop. According to the experimental results, t...

NTT SMT System 2008 at NTCIR-7

This paper describes NTT SMT System 2008, presented at the patent translation task (PAT-MT) in NTCIR-7. For PAT-MT, we submitted our strong baseline system, which faithfully follows hierarchical phrase-based statistical machine translation [2]. Hierarchical phrase-based SMT is based on synchronous CFGs, in which paired source/target rules are synchronously applied starting from the initial sym...

Statistical Machine Translation with Terminology

This paper considers a scenario which is slightly different from Statistical Machine Translation (SMT) in that we are given almost perfect knowledge about bilingual terminology, considering the situation when a Japanese patent is applied to or granted by the Japanese Patent Office (JPO). Technically, we incorporate bilingual terminology into Phrase-based SMT (PB-SMT) focusing on the statistical...

Publication date: 2008